1.
ACR Open Rheumatol ; 2024 May 15.
Article in English | MEDLINE | ID: mdl-38747148

ABSTRACT

OBJECTIVE: We aimed to examine the feasibility of applying natural language processing (NLP) to unstructured electronic health record (EHR) documents to detect the presence of financial insecurity among patients with rheumatologic disease enrolled in an integrated care management program (iCMP). METHODS: We incorporated supervised, rule-based NLP and statistical methods to identify financial insecurity among patients with rheumatic conditions enrolled in an iCMP (n = 20,395) in a multihospital EHR system. We constructed a lexicon for financial insecurity using data from available knowledge sources and then reviewed EHR notes from 538 randomly selected individuals (training cohort n = 366, validation cohort n = 172). We manually categorized records as having "definite," "possible," or "no" mention of financial insecurity. All available notes were processed using Narrative Information Linear Extraction, a rule-based version of NLP. Models were trained on the NLP features for financial insecurity using logistic regression, least absolute shrinkage and selection operator (LASSO), and random forest methods, and their performance characteristics were compared with the reference standard. RESULTS: A total of 245,142 notes were processed from 538 individual patient records. Financial insecurity was present among 100 (27%) individuals in the training cohort and 63 (37%) in the validation cohort. The LASSO and random forest models performed identically and slightly better than logistic regression, with positive predictive values of 0.90, sensitivities of 0.29, and specificities of 0.98. CONCLUSION: The development of a context-driven lexicon used with rule-based NLP to extract data that identify financial insecurity is feasible and improved the capture of financial insecurity with high accuracy. 
In the absence of a standard lexicon and construct definition for financial insecurity status, additional studies are needed to optimize the sensitivity of algorithms to categorize financial insecurity with construct validity.
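
The performance figures in this abstract (positive predictive value, sensitivity, specificity) all derive from a 2x2 confusion matrix. The sketch below shows the arithmetic only. The four counts are hypothetical, chosen so that the rounded results resemble the reported values for a cohort of 172; they are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return PPV, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),          # of those flagged, how many are true cases
        "sensitivity": tp / (tp + fn),  # of true cases, how many were flagged
        "specificity": tn / (tn + fp),  # of non-cases, how many were left unflagged
    }

# Hypothetical counts (assumption, not from the paper): 63 true cases, 109 non-cases.
metrics = classification_metrics(tp=18, fp=2, tn=107, fn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With counts like these, a model can reach a high PPV while still missing most true cases, which is the pattern the abstract's conclusion points to when it calls for better sensitivity.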

2.
Online J Public Health Inform ; 16: e53445, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38700929

ABSTRACT

BACKGROUND: Post-COVID-19 condition (colloquially known as "long COVID-19") characterized as postacute sequelae of SARS-CoV-2 has no universal clinical case definition. Recent efforts have focused on understanding long COVID-19 symptoms, and electronic health record (EHR) data provide a unique resource for understanding this condition. The introduction of the International Classification of Diseases, Tenth Revision (ICD-10) code U09.9 for "Post COVID-19 condition, unspecified" to identify patients with long COVID-19 has provided a method of evaluating this condition in EHRs; however, the accuracy of this code is unclear. OBJECTIVE: This study aimed to characterize the utility and accuracy of the U09.9 code across 3 health care systems-the Veterans Health Administration, the Beth Israel Deaconess Medical Center, and the University of Pittsburgh Medical Center-against patients identified with long COVID-19 via a chart review by operationalizing the World Health Organization (WHO) and Centers for Disease Control and Prevention (CDC) definitions. METHODS: Patients who were COVID-19 positive with either a U07.1 ICD-10 code or positive polymerase chain reaction test within these health care systems were identified for chart review. Among this cohort, we sampled patients based on two approaches: (1) with a U09.9 code and (2) without a U09.9 code but with a new onset long COVID-19-related ICD-10 code, which allows us to assess the sensitivity of the U09.9 code. To operationalize the long COVID-19 definition based on health agency guidelines, symptoms were grouped into a "core" cluster of 11 commonly reported symptoms among patients with long COVID-19 and an extended cluster that captured all other symptoms by disease domain. Patients having ≥2 symptoms persisting for ≥60 days that were new onset after their COVID-19 infection, with ≥1 symptom in the core cluster, were labeled as having long COVID-19 per chart review. 
The code's performance was compared across 3 health care systems and across different time periods of the pandemic. RESULTS: Overall, 900 patient charts were reviewed across 3 health care systems. The prevalence of long COVID-19 among the cohort with the U09.9 ICD-10 code based on the operationalized WHO definition was between 23.2% and 62.4% across these health care systems. We also evaluated a less stringent version of the WHO definition and the CDC definition and observed an increase in the prevalence of long COVID-19 at all 3 health care systems. CONCLUSIONS: This is one of the first studies to evaluate the U09.9 code against a clinical case definition for long COVID-19, as well as the first to apply this definition to EHR data using a chart review approach on a nationwide cohort across multiple health care systems. This chart review approach can be implemented at other EHR systems to further evaluate the utility and performance of the U09.9 code.
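
The chart-review rule in this abstract (at least 2 new-onset symptoms persisting at least 60 days, with at least 1 in the core cluster) can be written as a small predicate. This is a minimal sketch under stated assumptions: the `Symptom` record, its field names, and the example symptoms are hypothetical, and the abstract does not list which 11 symptoms make up the core cluster.

```python
from dataclasses import dataclass

@dataclass
class Symptom:
    name: str
    duration_days: int
    new_onset_after_covid: bool
    in_core_cluster: bool  # core-cluster membership is assumed to be pre-assigned

def meets_long_covid_definition(symptoms, min_symptoms=2, min_days=60, min_core=1):
    """Apply the operationalized rule: enough persistent new-onset symptoms, some core."""
    qualifying = [
        s for s in symptoms
        if s.new_onset_after_covid and s.duration_days >= min_days
    ]
    core = [s for s in qualifying if s.in_core_cluster]
    return len(qualifying) >= min_symptoms and len(core) >= min_core

# Hypothetical patient: two qualifying symptoms, one of them in the core cluster.
patient = [
    Symptom("fatigue", 90, True, True),
    Symptom("rash", 75, True, False),
    Symptom("cough", 20, True, True),  # too short to count
]
print(meets_long_covid_definition(patient))
```

Exposing the thresholds as parameters makes it straightforward to evaluate a less stringent variant of the definition, as the study reports doing.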

3.
Article in English | MEDLINE | ID: mdl-38652572

ABSTRACT

OBJECTIVES: Rheumatoid arthritis (RA) and atherosclerosis share many common inflammatory pathways. We studied whether a multi-biomarker panel for RA disease activity (MBDA) would associate with changes in arterial inflammation in an interventional trial. METHODS: In the TARGET Trial, RA patients with active disease despite methotrexate were randomly assigned to the addition of either a TNF inhibitor or sulfasalazine+hydroxychloroquine (triple therapy). Baseline and 24-week follow-up 18F-fluorodeoxyglucose (FDG) positron emission tomography/computed tomography scans were assessed for change in arterial inflammation measured as the maximal arterial target-to-blood background ratio of FDG uptake in the most diseased segment of the carotid arteries or aorta (MDS-TBRmax). The MBDA test, measured at baseline and weeks 6, 18, and 24, was assessed for its association with the change in MDS-TBRmax. RESULTS: Interpretable scans were available at baseline and week 24 for n = 112 patients. The MBDA score at week 24 was significantly correlated with the change in MDS-TBRmax (Spearman's rho = 0.239; p = 0.011) and remained significantly associated after adjustment for relevant confounders. Those with low MBDA at week 24 had a statistically significant adjusted reduction in arterial inflammation of 0.35 units vs no significant reduction in those who did not achieve low MBDA. Neither DAS28-CRP nor CRP predicted change in arterial inflammation. The MBDA component with the strongest association with change in arterial inflammation was serum amyloid A (SAA). CONCLUSIONS: Among treated RA patients, achieved MBDA predicts changes in arterial inflammation. Achieving low MBDA at 24 weeks was associated with clinically meaningful reductions in arterial inflammation, regardless of treatment.

4.
J Am Heart Assoc ; 13(9): e030387, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38686879

ABSTRACT

BACKGROUND: Coronary microvascular dysfunction as measured by myocardial flow reserve (MFR) is associated with increased cardiovascular risk in rheumatoid arthritis (RA). The objective of this study was to determine the association of reduced inflammation with MFR and other measures of cardiovascular risk. METHODS AND RESULTS: Patients with RA with active disease about to initiate a tumor necrosis factor inhibitor were enrolled (NCT02714881). All subjects underwent a cardiac perfusion positron emission tomography scan to quantify MFR at baseline before tumor necrosis factor inhibitor initiation, and after tumor necrosis factor inhibitor initiation at 24 weeks. MFR <2.5 in the absence of obstructive coronary artery disease was defined as coronary microvascular dysfunction. Blood samples at baseline and 24 weeks were measured for inflammatory markers (eg, high-sensitivity C-reactive protein [hsCRP], interleukin-1β, and high-sensitivity cardiac troponin T [hs-cTnT]). The primary outcome was mean MFR before and after tumor necrosis factor inhibitor initiation, with Δhs-cTnT as the secondary outcome. Secondary and exploratory analyses included the correlation between ΔhsCRP and other inflammatory markers with MFR and hs-cTnT. We studied 66 subjects, 82% of whom were women, with a mean RA duration of 7.4 years. The median atherosclerotic cardiovascular disease risk was 2.5%; 47% had coronary microvascular dysfunction and 23% had detectable hs-cTnT. We observed no change in mean MFR (2.65 before vs 2.64 after treatment, P=0.6) or in hs-cTnT. A correlation was observed between a reduction in hsCRP and interleukin-1β with a reduction in hs-cTnT. CONCLUSIONS: In this RA cohort with low prevalence of cardiovascular risk factors, nearly 50% of subjects had coronary microvascular dysfunction at baseline. A reduction in inflammation was not associated with improved MFR. 
However, a modest reduction in interleukin-1β, but in no other inflammatory pathway, was correlated with a reduction in subclinical myocardial injury. REGISTRATION: URL: https://www.clinicaltrials.gov; Unique identifier: NCT02714881.


Subjects
Arthritis, Rheumatoid; Biomarkers; Coronary Circulation; Inflammation; Microcirculation; Aged; Female; Humans; Male; Middle Aged; Antirheumatic Agents/therapeutic use; Arthritis, Rheumatoid/physiopathology; Arthritis, Rheumatoid/complications; Arthritis, Rheumatoid/blood; Biomarkers/blood; C-Reactive Protein/metabolism; Coronary Artery Disease/physiopathology; Coronary Artery Disease/blood; Coronary Artery Disease/diagnosis; Coronary Circulation/physiology; Coronary Vessels/physiopathology; Coronary Vessels/diagnostic imaging; Fractional Flow Reserve, Myocardial/physiology; Heart Disease Risk Factors; Inflammation/blood; Inflammation/physiopathology; Inflammation Mediators/blood; Interleukin-1beta/blood; Myocardial Perfusion Imaging/methods; Positron-Emission Tomography; Treatment Outcome; Troponin T/blood; Tumor Necrosis Factor Inhibitors/therapeutic use
5.
Sci Rep ; 14(1): 8021, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38580710

ABSTRACT

The Phenome-Wide Association Study (PheWAS) is increasingly used to broadly screen for potential treatment effects, e.g., IL6R variant as a proxy for IL6R antagonists. This approach offers an opportunity to address the limited power in clinical trials to study differential treatment effects across patient subgroups. However, limited methods exist to efficiently test for differences across subgroups in the thousands of multiple comparisons generated as part of a PheWAS. In this study, we developed an approach that maximizes the power to test for heterogeneous genotype-phenotype associations and applied this approach to an IL6R PheWAS among individuals of African (AFR) and European (EUR) ancestries. We identified 29 traits with differences in IL6R variant-phenotype associations, including a lower risk of type 2 diabetes in AFR (OR 0.96) vs EUR (OR 1.0, p-value for heterogeneity = 8.5 × 10^-3), and higher white blood cell count (p-value for heterogeneity = 8.5 × 10^-131). These data suggest a more salutary effect of IL6R blockade for T2D among individuals of AFR vs EUR ancestry and provide data to inform ongoing clinical trials targeting IL6 for an expanding number of conditions. Moreover, the method to test for heterogeneity of associations can be applied broadly to other large-scale genotype-phenotype screens in diverse populations.
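
To make the idea of a "p-value for heterogeneity" concrete, the sketch below applies a textbook two-group z-test to the difference between two independent log odds ratios. This is a generic test, not the power-maximizing approach the authors developed, which the abstract does not describe in detail. The odds ratios come from the abstract; the standard errors are hypothetical placeholders, so the resulting p-value is not expected to match the reported one.

```python
import math

def heterogeneity_z_test(or_a, se_log_or_a, or_b, se_log_or_b):
    """Two-sided z-test for a difference between two independent log odds ratios."""
    diff = math.log(or_a) - math.log(or_b)
    se_diff = math.sqrt(se_log_or_a ** 2 + se_log_or_b ** 2)
    z = diff / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return z, p_value

# ORs from the abstract (AFR 0.96, EUR 1.0); standard errors are assumed values.
z, p = heterogeneity_z_test(or_a=0.96, se_log_or_a=0.012, or_b=1.00, se_log_or_b=0.005)
print(f"z = {z:.2f}, p = {p:.4f}")
```

In a PheWAS this comparison is repeated across thousands of traits, so any per-trait p-value would still need a multiple-testing correction.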


Subjects
Diabetes Mellitus, Type 2; Humans; Diabetes Mellitus, Type 2/drug therapy; Diabetes Mellitus, Type 2/genetics; Genetic Association Studies; Phenotype; Polymorphism, Single Nucleotide; Receptors, Interleukin-6/genetics
6.
J Am Med Inform Assoc ; 31(5): 1126-1134, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38481028

ABSTRACT

OBJECTIVE: Development of clinical phenotypes from electronic health records (EHRs) can be resource intensive. Several phenotype libraries have been created to facilitate reuse of definitions. However, these platforms vary in target audience and utility. We describe the development of the Centralized Interactive Phenomics Resource (CIPHER) knowledgebase, a comprehensive public-facing phenotype library, which aims to facilitate clinical and health services research. MATERIALS AND METHODS: The platform was designed to collect and catalog EHR-based computable phenotype algorithms from any healthcare system, scale metadata management, facilitate phenotype discovery, and allow for integration of tools and user workflows. Phenomics experts were engaged in the development and testing of the site. RESULTS: The knowledgebase stores phenotype metadata using the CIPHER standard, and definitions are accessible through complex searching. Phenotypes are contributed to the knowledgebase via webform, allowing metadata validation. Data visualization tools linking to the knowledgebase enhance user interaction with content and accelerate phenotype development. DISCUSSION: The CIPHER knowledgebase was developed in the largest healthcare system in the United States and piloted with external partners. The design of the CIPHER website supports a variety of front-end tools and features to facilitate phenotype development and reuse. Health data users are encouraged to contribute their algorithms to the knowledgebase for wider dissemination to the research community, and to use the platform as a springboard for phenotyping. CONCLUSION: CIPHER is a public resource for all health data users available at https://phenomics.va.ornl.gov/ which facilitates phenotype reuse, development, and dissemination of phenotyping knowledge.


Subjects
Electronic Health Records; Phenomics; Phenotype; Knowledge Bases; Algorithms
7.
Biometrics ; 80(1)2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38465982

ABSTRACT

In many modern machine learning applications, changes in covariate distributions and difficulty in acquiring outcome information have posed challenges to robust model training and evaluation. Numerous transfer learning methods have been developed to robustly adapt the model itself to some unlabeled target populations using existing labeled data in a source population. However, there is a paucity of literature on transferring performance metrics, especially receiver operating characteristic (ROC) parameters, of a trained model. In this paper, we aim to evaluate the performance of a trained binary classifier on unlabeled target population based on ROC analysis. We proposed Semisupervised Transfer lEarning of Accuracy Measures (STEAM), an efficient three-step estimation procedure that employs (1) double-index modeling to construct calibrated density ratio weights and (2) robust imputation to leverage the large amount of unlabeled data to improve estimation efficiency. We establish the consistency and asymptotic normality of the proposed estimator under the correct specification of either the density ratio model or the outcome model. We also correct for potential overfitting bias in the estimators in finite samples with cross-validation. We compare our proposed estimators to existing methods and show reductions in bias and gains in efficiency through simulations. We illustrate the practical utility of the proposed method on evaluating prediction performance of a phenotyping model for rheumatoid arthritis (RA) on a temporally evolving EHR cohort.
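
The role of density ratio weights in transferring an accuracy measure can be illustrated with an importance-weighted AUC: each labeled source observation is weighted by how representative it is of the target population. The sketch below shows only that reweighting idea. It is not the STEAM procedure (it has no double-index modeling, imputation, or cross-validation), and the scores, labels, and weights are hypothetical.

```python
def weighted_auc(scores, labels, weights):
    """AUC where each positive/negative pair counts with the product of its weights."""
    numerator = 0.0
    denominator = 0.0
    for s_pos, y_pos, w_pos in zip(scores, labels, weights):
        if y_pos != 1:
            continue
        for s_neg, y_neg, w_neg in zip(scores, labels, weights):
            if y_neg != 0:
                continue
            pair_weight = w_pos * w_neg
            denominator += pair_weight
            if s_pos > s_neg:
                numerator += pair_weight        # correctly ordered pair
            elif s_pos == s_neg:
                numerator += 0.5 * pair_weight  # ties count as half
    if denominator == 0.0:
        raise ValueError("need at least one positive and one negative with nonzero weight")
    return numerator / denominator

# Hypothetical classifier scores and labels from a labeled source sample.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
print(weighted_auc(scores, labels, [1.0, 1.0, 1.0, 1.0]))  # unweighted source AUC
print(weighted_auc(scores, labels, [1.0, 1.0, 3.0, 1.0]))  # third case up-weighted for the target
```

Up-weighting the poorly scored positive case lowers the estimate, which is how a covariate shift toward such patients would show up as reduced performance in the target population.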


Subjects
Machine Learning; Supervised Machine Learning; Humans; ROC Curve; Research Design; Bias
8.
Semin Arthritis Rheum ; 66: 152421, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38457949

ABSTRACT

OBJECTIVE: Switching biologic and targeted synthetic DMARD (b/tsDMARD) medications occurs commonly in RA patients; however, data are limited on the reasons for these changes. The objective of the study was to identify and categorize reasons for b/tsDMARD switching and investigate characteristics associated with treatment refractory RA. METHODS: In a multi-hospital RA electronic health record (EHR) cohort, we identified RA patients prescribed ≥1 b/tsDMARD between 2001 and 2017. Consistent with the EULAR "difficult to treat" (D2T) RA definition, we further identified patients who discontinued ≥2 b/tsDMARDs with different mechanisms of action. We performed manual chart review to determine reasons for medication discontinuation. We defined "treatment refractory" RA as not achieving low disease activity (<3 tender or swollen joints on <7.5 mg of daily prednisone equivalent) despite treatment with two different b/tsDMARD mechanisms of action. We compared demographic, lifestyle, and clinical factors between treatment refractory RA and b/tsDMARD initiators not meeting D2T criteria. RESULTS: We identified 6040 RA patients prescribed ≥1 b/tsDMARD including 404 meeting D2T criteria. The most common reasons for medication discontinuation were inadequate response (43.3%), loss of efficacy (25.8%), and non-allergic adverse events (13.7%). Of patients with D2T RA, 15% had treatment refractory RA. Treatment refractory RA patients were younger at b/tsDMARD initiation (mean 47.2 vs. 55.2 years, p < 0.001), more commonly female (91.8% vs. 76.1%, p = 0.006), and ever smokers (68.9% vs. 49.9%, p = 0.005). No RA clinical factors differentiated treatment refractory RA patients from b/tsDMARD initiators. CONCLUSIONS: In a large EHR-based RA cohort, the most common reasons for b/tsDMARD switching were inadequate response, loss of efficacy, and nonallergic adverse events (e.g., infections, leukopenia, psoriasis). 
Clinical RA factors were insufficient for differentiating b/tsDMARD responders from nonresponders.


Subjects
Antirheumatic Agents; Arthritis, Rheumatoid; Biological Products; Drug Substitution; Humans; Arthritis, Rheumatoid/drug therapy; Female; Male; Middle Aged; Antirheumatic Agents/therapeutic use; Biological Products/therapeutic use; Aged; Adult
9.
J Am Heart Assoc ; 13(5): e032095, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38416140

ABSTRACT

Cardiovascular disease remains an important comorbidity in patients with rheumatoid arthritis (RA), but traditional models do not accurately predict cardiovascular risk in patients with RA. The addition of biomarkers could improve prediction. METHODS AND RESULTS: The TARGET (Treatments Against RA and Effect on FDG PET/CT) trial assessed whether different treatment strategies in RA differentially impact cardiovascular risk as measured by the change in arterial inflammation on arterial target to background ratio on fluorodeoxyglucose positron emission tomography/computed tomography scans conducted 24 weeks apart. A group of 24 candidate biomarkers supported by prior literature was assessed at baseline and 24 weeks later. Longitudinal analyses examined the association between baseline biomarker values, measured in plasma EDTA, and the change in arterial inflammation target to background ratio. Model fit was assessed for the candidate biomarkers only, clinical variables only, and models combining both. One hundred nine patients with median (interquartile range) age 58 years (53-65 years), RA duration 1.4 years (0.5-6.6 years), and 82% women had biomarkers assessed at baseline and follow-up. Because the main trial analyses demonstrated significant target to background ratio decreases with both treatment strategies but no difference across treatment groups, we analyzed all patients together. Baseline values of serum amyloid A, C-reactive protein, soluble tumor necrosis factor receptor 1, adiponectin, YKL-40, and osteoprotegerin were associated with significant change in target to background ratio. When selected candidate biomarkers were added to the clinical variables, the adjusted R2 improved from 0.20 to 0.33 (likelihood ratio P=0.0005). CONCLUSIONS: A candidate biomarker approach identified several promising biomarkers that associate with baseline and treatment-associated changes in arterial inflammation in patients with RA. 
These will now be tested in an external validation cohort.


Subjects
Arteritis; Arthritis, Rheumatoid; Cardiovascular Diseases; Female; Humans; Male; Middle Aged; Arteritis/complications; Arthritis, Rheumatoid/complications; Arthritis, Rheumatoid/diagnosis; Arthritis, Rheumatoid/drug therapy; Biomarkers; Cardiovascular Diseases/diagnosis; Cardiovascular Diseases/epidemiology; Cardiovascular Diseases/etiology; Heart Disease Risk Factors; Positron Emission Tomography Computed Tomography/methods; Risk Factors; Aged
10.
Patterns (N Y) ; 5(1): 100906, 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38264714

ABSTRACT

Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE models the sequential dependency between the target event and visit embeddings to derive the timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients to perform semi-supervised training. LATTE is evaluated on the onset of type 2 diabetes, heart failure, and relapses of multiple sclerosis. LATTE consistently achieves substantial improvements over benchmark methods while providing high prediction interpretability. The event timings are shown to help discover risk factors of heart failure among patients with rheumatoid arthritis.

11.
bioRxiv ; 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-37503080

ABSTRACT

Understanding protein function and developing molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across diverse biological contexts, such as tissues and cell types, remains a significant challenge for existing algorithms. We introduce Pinnacle, a flexible geometric deep learning approach that is trained on contextualized protein interaction networks to generate context-aware protein representations. Leveraging a human multi-organ single-cell transcriptomic atlas, Pinnacle provides 394,760 protein representations split across 156 cell type contexts from 24 tissues and organs. Pinnacle's contextualized representations of proteins reflect cellular and tissue organization and Pinnacle's tissue representations enable zero-shot retrieval of the tissue hierarchy. Pretrained Pinnacle's protein representations can be adapted for downstream tasks: to enhance 3D structure-based protein representations for important protein interactions in immuno-oncology (PD-1/PD-L1 and B7-1/CTLA-4) and to study the effects of drugs across cell type contexts. Pinnacle outperforms state-of-the-art, yet context-free, models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases, and can pinpoint cell type contexts that predict therapeutic targets better than context-free models (29 out of 156 cell types in rheumatoid arthritis; 13 out of 152 cell types in inflammatory bowel diseases). Pinnacle is a graph-based contextual AI model that dynamically adjusts its outputs based on biological contexts in which it operates.

12.
Arthritis Rheumatol ; 76(3): 356-362, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37791989

ABSTRACT

OBJECTIVE: Recent studies have uncovered diverse cell types and states in the rheumatoid arthritis (RA) synovium; however, limited data exist correlating these findings with patient-level clinical information. Using the largest cohort to date with clinical and multicell data, we determined associations between RA clinical factors with cell types and states in the RA synovium. METHODS: The Accelerated Medicines Partnership Rheumatoid Arthritis study recruited patients with active RA who were not receiving disease-modifying antirheumatic drugs (DMARDs) or who had an inadequate response to methotrexate (MTX) or tumor necrosis factor inhibitors. RA clinical factors were systematically collected. Biopsies were performed on an inflamed joint, and tissue was disaggregated and processed with a cellular indexing of transcriptomes and epitopes sequencing pipeline from which the following cell type percentages and cell type abundance phenotypes (CTAPs) were derived: endothelial, fibroblast, and myeloid (EFM); fibroblasts; myeloid; T and B cells; T cells and fibroblasts (TF); and T and myeloid cells. Correlations were measured between RA clinical factors, cell type percentage, and CTAPs. RESULTS: We studied 72 patients (mean age 57 years, 75% women, 83% seropositive, mean RA duration 6.6 years, mean Disease Activity Score-28 C-reactive Protein 3 [DAS28-CRP3] score 4.8). Higher DAS28-CRP3 correlated with a higher T cell percentage (P < 0.01). Those receiving MTX and not a biologic DMARD (bDMARD) had a higher percentage of B cells versus those receiving no DMARDs (P < 0.01). Most of those receiving bDMARDs were categorized as EFM (57%), whereas none were TF. No significant difference was observed across CTAPs for age, sex, RA disease duration, or DAS28-CRP3. 
CONCLUSION: In this comprehensive screen of clinical factors, we observed differential associations between DMARDs and cell phenotypes, suggesting that RA therapies, more than other clinical factors, may impact cell type/state in the synovium and ultimately influence response to subsequent therapies.


Subjects
Antirheumatic Agents; Arthritis, Rheumatoid; Humans; Female; Middle Aged; Male; Antirheumatic Agents/therapeutic use; Methotrexate/therapeutic use; Arthritis, Rheumatoid/drug therapy; Synovial Membrane; Rheumatoid Factor
13.
Pharmacoepidemiol Drug Saf ; 33(1): e5684, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37654015

ABSTRACT

BACKGROUND: We aimed to determine whether integrating concepts from the notes from the electronic health record (EHR) data using natural language processing (NLP) could improve the identification of gout flares. METHODS: Using Medicare claims linked with EHR, we selected gout patients who initiated the urate-lowering therapy (ULT). Patients' 12-month baseline period and on-treatment follow-up were segmented into 1-month units. We retrieved EHR notes for months with gout diagnosis codes and processed notes for NLP concepts. We selected a random sample of 500 patients and reviewed each of their notes for the presence of a physician-documented gout flare. Months containing at least 1 note mentioning gout flares were considered months with events. We used 60% of patients to train predictive models with LASSO. We evaluated the models by the area under the curve (AUC) in the validation data and examined positive/negative predictive values (P/NPV). RESULTS: We extracted and labeled 839 months of follow-up (280 with gout flares). The claims-only model selected 20 variables (AUC = 0.69). The NLP concept-only model selected 15 (AUC = 0.69). The combined model selected 32 claims variables and 13 NLP concepts (AUC = 0.73). The claims-only model had a PPV of 0.64 [0.50, 0.77] and an NPV of 0.71 [0.65, 0.76], whereas the combined model had a PPV of 0.76 [0.61, 0.88] and an NPV of 0.71 [0.65, 0.76]. CONCLUSION: Adding NLP concept variables to claims variables resulted in a small improvement in the identification of gout flares. Our data-driven claims-only model and our combined claims/NLP-concept model outperformed existing rule-based claims algorithms reliant on medication use, diagnosis, and procedure codes.
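
The labeling step described in this abstract (follow-up split into 1-month units, with a month counted as an event if at least one note mentions a flare) can be sketched as a simple aggregation. The input format and the use of calendar months are assumptions made for illustration; the study may have anchored its 1-month units to each patient's own timeline.

```python
from datetime import date

def label_months(notes):
    """Map each (year, month) with notes to True if any note in it mentions a gout flare.

    `notes` is an iterable of (note_date, mentions_flare) pairs (hypothetical format).
    """
    months = {}
    for note_date, mentions_flare in notes:
        key = (note_date.year, note_date.month)
        months[key] = months.get(key, False) or mentions_flare
    return months

# Hypothetical reviewed notes for one patient.
reviewed_notes = [
    (date(2015, 3, 2), False),
    (date(2015, 3, 20), True),   # one flare mention makes March an event month
    (date(2015, 4, 11), False),
]
month_labels = label_months(reviewed_notes)
print(month_labels)
```

The resulting month-level labels are the outcome that the claims-only, NLP-only, and combined models would then be trained to predict.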


Subjects
Gout; Aged; Humans; United States/epidemiology; Gout/diagnosis; Gout/epidemiology; Natural Language Processing; Electronic Health Records; Medicare; Symptom Flare Up; Algorithms
14.
Neurol Genet ; 10(1): e200110, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38130828

ABSTRACT

Background and Objectives: Nearly all genetic analyses of Parkinson disease (PD) have been in populations of European ancestry. We sought to test the ability of a machine learning method to extract accurate PD diagnoses from an electronic medical record (EMR) system, to see whether genetic variants identified in European populations generalize to individuals of African and Hispanic ancestries, and to compare the rates of PD across ancestries. Methods: A machine learning method using natural language processing was applied to EMRs of US veterans participating in the VA Million Veteran Program (MVP) to identify individuals with PD. These putative cases were vetted via blind chart review by a movement disorder specialist. A polygenic risk score (PRS) of 90 established genetic variants whose genotypes were imputed from a customized Axiom Biobank Array was evaluated in different case groups. Results: The EMR prediction scores had a distinct trimodal distribution, with 97% of the high group and only 30% of the middle group having a credible diagnosis of PD. Using the 3,542 cases from the high group matched 4:1 to controls, the PRS was highly predictive in individuals of European ancestry (n = 3,137 cases; OR = 1.82; p = 8.01E-48), and nearly identical effect sizes were seen in individuals of African (n = 184; OR = 2.07; p = 3.4E-4) and Hispanic ancestries (n = 221; OR = 2.13; p = 3.9E-6). The PRS was much less predictive for the 2,757 European ancestry cases who had an ICD code for PD but for whom the machine learning method had a lower confidence in their diagnosis. No novel ancestry-specific genetic variants were identified. Individuals with African ancestry had one-quarter the rate of PD compared with European or Hispanic ancestries aged 60-70 years and one half the rate in the 70-80 years age range. African American cases had a higher proportion of their DNA originating in Europe compared with African American controls. 
Discussion: Machine learning can reliably classify PD using data from a large EMR. Larger studies of non-European populations are required to confirm the generalizability of PD risk variants identified in populations of European ancestry and the increased risk coming from a higher proportion of European DNA in African Americans.
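
A polygenic risk score of the kind evaluated here is conventionally a weighted sum of risk-allele dosages, with weights taken from published effect sizes (log odds ratios). The sketch below shows that standard calculation only. The variant identifiers, weights, and dosages are hypothetical, and treating a missing variant as dosage 0 is a simplifying assumption rather than the study's imputation procedure.

```python
def polygenic_risk_score(dosages, weights):
    """Sum of effect weight times risk-allele dosage over the scored variants."""
    # Variants absent from `dosages` contribute 0 (assumption for this sketch).
    return sum(weight * dosages.get(variant, 0.0) for variant, weight in weights.items())

# Hypothetical effect weights (log odds ratios) for three illustrative variants.
weights = {"variant_a": 0.20, "variant_b": -0.10, "variant_c": 0.05}

# Hypothetical imputed dosages (0 to 2 copies of the risk allele) for one individual.
dosages = {"variant_a": 2.0, "variant_b": 1.0}

score = polygenic_risk_score(dosages, weights)
print(f"PRS = {score:.2f}")
```

In an analysis like the one described, scores would typically be standardized before estimating an odds ratio per unit of PRS within each ancestry group.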

16.
Nature ; 623(7987): 616-624, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37938773

ABSTRACT

Rheumatoid arthritis is a prototypical autoimmune disease that causes joint inflammation and destruction [1]. There is currently no cure for rheumatoid arthritis, and the effectiveness of treatments varies across patients, suggesting an undefined pathogenic diversity [1,2]. Here, to deconstruct the cell states and pathways that characterize this pathogenic heterogeneity, we profiled the full spectrum of cells in inflamed synovium from patients with rheumatoid arthritis. We used multi-modal single-cell RNA-sequencing and surface protein data coupled with histology of synovial tissue from 79 donors to build a single-cell atlas of rheumatoid arthritis synovial tissue that includes more than 314,000 cells. We stratified tissues into six groups, referred to as cell-type abundance phenotypes (CTAPs), each characterized by selectively enriched cell states. These CTAPs demonstrate the diversity of synovial inflammation in rheumatoid arthritis, ranging from samples enriched for T and B cells to those largely lacking lymphocytes. Disease-relevant cell states, cytokines, risk genes, histology and serology metrics are associated with particular CTAPs. CTAPs are dynamic and can predict treatment response, highlighting the clinical utility of classifying rheumatoid arthritis synovial phenotypes. This comprehensive atlas and molecular, tissue-based stratification of rheumatoid arthritis synovial tissue reveal new insights into rheumatoid arthritis pathology and heterogeneity that could inform novel targeted treatments.


Subjects
Rheumatoid Arthritis , Humans , Rheumatoid Arthritis/complications , Rheumatoid Arthritis/genetics , Rheumatoid Arthritis/immunology , Rheumatoid Arthritis/pathology , Cytokines/metabolism , Inflammation/complications , Inflammation/genetics , Inflammation/immunology , Inflammation/pathology , Synovial Membrane/pathology , T-Lymphocytes/immunology , B-Lymphocytes/immunology , Genetic Predisposition to Disease/genetics , Phenotype , Single-Cell Gene Expression Analysis
17.
medRxiv ; 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-37873131

ABSTRACT

Though electronic health record (EHR) systems are a rich repository of clinical information with large potential, the use of EHR-based phenotyping algorithms is often hindered by inaccurate diagnostic records, the presence of many irrelevant features, and the requirement for a human-labeled training set. In this paper, we describe a knowledge-driven online multimodal automated phenotyping (KOMAP) system that i) generates a list of informative features by an online narrative and codified feature search engine (ONCE) and ii) enables the training of a multimodal phenotyping algorithm based on summary data. Powered by composite knowledge from multiple EHR sources, online article corpora, and a large language model, features selected by ONCE show high concordance with the state-of-the-art AI models (GPT4 and ChatGPT) and encourage large-scale phenotyping by providing a smaller but highly relevant feature set. Validation of the KOMAP system across four healthcare centers suggests that it can generate efficient phenotyping algorithms with robust performance. Compared to other methods requiring patient-level inputs and gold-standard labels, the fully online KOMAP provides a significant opportunity to enable multi-center collaboration.

18.
medRxiv ; 2023 May 21.
Article in English | MEDLINE | ID: mdl-37293026

ABSTRACT

Objective: Electronic health record (EHR) systems contain a wealth of clinical data stored as both codified data and free-text narrative notes, covering hundreds of thousands of clinical concepts available for research and clinical care. The complex, massive, heterogeneous, and noisy nature of EHR data imposes significant challenges for feature representation, information extraction, and uncertainty quantification. To address these challenges, we proposed an efficient Aggregated naRrative Codified Health (ARCH) records analysis to generate a large-scale knowledge graph (KG) for a comprehensive set of EHR codified and narrative features. Methods: The ARCH algorithm first derives embedding vectors from a co-occurrence matrix of all EHR concepts and then generates cosine similarities along with associated p-values to measure the strength of relatedness between clinical features with statistical certainty quantification. In the final step, ARCH performs a sparse embedding regression to remove indirect linkage between entity pairs. We validated the clinical utility of the ARCH knowledge graph, generated from 12.5 million patients in the Veterans Affairs (VA) healthcare system, through downstream tasks including detecting known relationships between entity pairs, predicting drug side effects, disease phenotyping, as well as sub-typing Alzheimer's disease patients. Results: ARCH produces high-quality clinical embeddings and KG for over 60,000 EHR concepts, as visualized in the R-shiny powered web-API (https://celehs.hms.harvard.edu/ARCH/). The ARCH embeddings attained an average area under the ROC curve (AUC) of 0.926 and 0.861 for detecting pairs of similar EHR concepts when the concepts are mapped to codified data and to NLP data; and 0.810 (codified) and 0.843 (NLP) for detecting related pairs. Based on the p-values computed by ARCH, the sensitivities for detecting similar and related entity pairs are 0.906 and 0.888, respectively, under false discovery rate (FDR) control of 5%. 
For detecting drug side effects, the cosine similarity based on the ARCH semantic representations achieved an AUC of 0.723 while the AUC improved to 0.826 after few-shot training via minimizing the loss function on the training data set. Incorporating NLP data substantially improved the ability to detect side effects in the EHR. For example, based on unsupervised ARCH embeddings, the power of detecting drug-side effect pairs when using codified data only was 0.15, much lower than the power of 0.51 when using both codified and NLP concepts. Compared to existing large-scale representation learning methods including PubMedBERT, BioBERT, and SapBERT, ARCH attains the most robust performance and substantially higher accuracy in detecting these relationships. Incorporating ARCH-selected features in weakly supervised phenotyping algorithms can improve the robustness of algorithm performance, especially for diseases that benefit from NLP features as supporting evidence. For example, the phenotyping algorithm for depression attained an AUC of 0.927 when using ARCH-selected features but only 0.857 when using codified features selected via the KESER network[1]. In addition, embeddings and knowledge graphs generated from the ARCH network were able to cluster AD patients into two subgroups, where the fast progression subgroup had a much higher mortality rate. Conclusions: The proposed ARCH algorithm generates large-scale high-quality semantic representations and a knowledge graph for both codified and NLP EHR features, useful for a wide range of predictive modeling tasks.
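
The cosine-similarity step described in the ARCH abstract can be illustrated with a minimal Python sketch. It is not the ARCH implementation: the concept names and vectors are hypothetical toy values, the vectors stand in for embeddings that ARCH would derive from a co-occurrence matrix, and the p-value and sparse-regression steps are omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Guard against zero vectors, which have no defined direction.
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical concept vectors (toy numbers, not real EHR embeddings).
vectors = {
    "concept_a": [4.0, 0.0, 2.0],
    "concept_b": [2.0, 0.0, 1.0],
    "concept_c": [0.0, 3.0, 0.0],
}

sim_ab = cosine(vectors["concept_a"], vectors["concept_b"])  # parallel vectors
sim_ac = cosine(vectors["concept_a"], vectors["concept_c"])  # orthogonal vectors
```

In this toy setup, concepts whose vectors point in the same direction score near 1 and unrelated concepts score near 0; ranking pairs by this score is the kind of comparison the reported AUC values summarize.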

19.
Arthritis Res Ther ; 25(1): 93, 2023 06 02.
Article in English | MEDLINE | ID: mdl-37269020

ABSTRACT

BACKGROUND: Many patients with rheumatoid arthritis (RA) require a trial of multiple biologic disease-modifying anti-rheumatic drugs (bDMARDs) to control their disease. With the availability of several bDMARD options, the history of bDMARDs may provide an alternative approach to understanding subphenotypes of RA. The objective of this study was to determine whether there exist distinct clusters of RA patients based on bDMARD prescription history to subphenotype RA. METHODS: We studied patients from a validated electronic health record-based RA cohort with data from January 1, 2008, through July 31, 2019; all subjects prescribed ≥ 1 bDMARD or targeted synthetic (ts) DMARD were included. To determine whether subjects had similar b/tsDMARD sequences, the sequences were considered as a Markov chain over the state-space of 5 classes of b/tsDMARDs. The maximum likelihood estimator (MLE)-based approach was used to estimate the Markov chain parameters to determine the clusters. The EHR data of study subjects were further linked with a registry containing prospectively collected data for RA disease activity, i.e., clinical disease activity index (CDAI). As a proof of concept, we tested whether the clusters derived from b/tsDMARD sequences correlated with clinical measures, specifically differing trajectories of CDAI. RESULTS: We studied 2172 RA subjects, mean age 52 years, RA duration 3.4 years, and 62% seropositive. We observed 550 unique b/tsDMARD sequences and identified 4 main clusters: (1) TNFi persisters (65.7%), (2) TNFi and abatacept therapy (8.0%), (3) on rituximab or multiple b/tsDMARDs (12.7%), (4) prescribed multiple therapies with tocilizumab predominant (13.6%). Compared to the other groups, TNFi persisters had the most favorable trajectory of CDAI over time. 
CONCLUSION: We observed that RA subjects can be clustered based on the sequence of b/tsDMARD prescriptions over time and that the clusters were correlated with differing trajectories of disease activity over time. This study highlights an alternative approach to consider subphenotyping of patients with RA for studies aimed at understanding treatment response.
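
As a rough illustration of the maximum likelihood step mentioned in the methods, the sketch below estimates first-order Markov transition probabilities from drug-class sequences by counting transitions. It covers only the single-chain MLE, not the clustering procedure used in the study, and the example sequences are hypothetical.

```python
from collections import Counter, defaultdict

def transition_mle(sequences):
    """MLE of first-order Markov transition probabilities.

    For each state a, P(a -> b) = count(a -> b) / count(a -> any state).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each state with its successor in the same sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: n / total for nxt, n in row.items()}
    return probs

# Hypothetical prescription sequences (illustrative only, not study data).
sequences = [
    ["TNFi", "TNFi", "TNFi"],
    ["TNFi", "abatacept"],
    ["TNFi", "rituximab", "rituximab"],
]

P = transition_mle(sequences)
```

Here `P["TNFi"]` holds the estimated probabilities of staying on a TNFi or switching class. A cluster-specific matrix of this kind is one plausible way to summarize groups such as "TNFi persisters", though the abstract does not give that level of detail.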


Subjects
Antirheumatic Agents , Rheumatoid Arthritis , Biological Products , Humans , Middle Aged , Rheumatoid Arthritis/drug therapy , Antirheumatic Agents/therapeutic use , Rituximab/therapeutic use , Abatacept/therapeutic use , Biological Products/therapeutic use
20.
J Biomed Inform ; 144: 104425, 2023 08.
Article in English | MEDLINE | ID: mdl-37331495

ABSTRACT

OBJECTIVE: Electronic health records (EHR), containing detailed longitudinal clinical information on a large number of patients and covering broad patient populations, open opportunities for comprehensive predictive modeling of disease progression and treatment response. However, since EHRs were originally constructed for administrative purposes not for research, in the EHR-linked studies, it is often not feasible to capture reliable information for analytical variables, especially in the survival setting, when both accurate event status and event times are needed for model building. For example, progression-free survival (PFS), a commonly used survival outcome for cancer patients, often involves complex information embedded in free-text clinical notes and cannot be extracted reliably. Proxies of PFS time such as time to the first mention of progression in the notes are at best good approximations to the true event time. This leads to difficulty in efficiently estimating event rates for an EHR patient cohort. Estimating survival rates based on error-prone outcome definitions can lead to biased results and hamper the power in the downstream analysis. On the other hand, extracting accurate event time information via manual annotation is time and resource intensive. The objective of this study is to develop a calibrated survival rate estimator using noisy outcomes from EHR data. MATERIALS AND METHODS: In this paper, we propose a two-stage semi-supervised calibration of noisy event rate (SCANER) estimator that can effectively overcome censoring induced dependency and attains more robust performance (i.e., not sensitive to misspecification of the imputation model) by fully utilizing both a small-labeled set of gold-standard survival outcomes annotated via manual chart review and a set of proxy features automatically captured via EHR in the unlabeled set. 
We validate the SCANER estimator by estimating the PFS rates for a virtual cohort of lung cancer patients from one large tertiary care center and the ICU-free survival rates for COVID patients from two large tertiary care centers. RESULTS: In terms of survival rate estimates, the SCANER estimator had very similar point estimates compared to the complete-case Kaplan-Meier estimator. On the other hand, other benchmark methods for comparison, which fail to account for the induced dependency between event time and the censoring time conditioning on surrogate outcomes, produced biased results across all three case studies. In terms of standard errors, the SCANER estimator was more efficient than the KM estimator, with up to 50% efficiency gain. CONCLUSION: The SCANER estimator achieves more efficient, robust, and accurate survival rate estimates compared to existing approaches. This promising new approach can also improve the resolution (i.e., granularity of event time) by using labels conditioning on multiple surrogates, particularly among less common or poorly coded conditions.
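
For readers unfamiliar with the Kaplan-Meier benchmark named in the results, the following is a minimal sketch of the standard product-limit estimator. It is not the SCANER estimator, and the times and event indicators are hypothetical toy values.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each distinct observed event time.

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit update: multiply by the fraction surviving past t.
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t.
        n_at_risk -= n_leaving
    return curve

# Hypothetical follow-up data (illustrative only).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

With manually annotated gold-standard labels, only a small labeled subset can feed this estimator, which is the efficiency limitation the abstract says SCANER addresses by also using proxy features from the unlabeled set.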


Subjects
COVID-19 , Lung Neoplasms , Humans , Electronic Health Records , Calibration , Survival Analysis